\section{NWA serialization}
\label{Se:serialization}

\newenvironment{grammar}{\begin{equation*}\begin{array}{lrl}}{\end{array}\end{equation*}}
\newcommand{\nonterm}[1]{\ensuremath{\langle}\textit{#1}\ensuremath{\rangle}}
\newcommand{\term}[1]{\textrm`\texttt{#1}\textrm'}
\newcommand{\cfgsp}{\ \ }


It is possible to serialize an NWA in two different supported
formats.

The first is a description language of our own creation, described in
most of the rest of this section. We also provide a parser for this format,
with minor caveats. The grammar that the parser uses is defined
formally in \sectref{nwa-grammar}.

The second is output in Graphviz Dot format. The output is natural. The color
of an edge denotes the type of a transition: black edges are internal
transitions, green edges are call transitions, and red edges are sets of return
transitions with the same exit site, return site, and symbol. Return
transitions are labeled with the symbol for its
transitions and the list of call predecessors.  Because it is possible to wind up with
extremely long key names (especially in a determinized NWA) that can
essentially destroy the ability of Dot to render useful output, by default
any state names longer than 20 characters are replaced by the numerical
value of the key. A call to \texttt{print\_dot} can specify different behavior, however.
All figures later in the document have been created via the Dot output (with
minimal hand editing in some cases).

There is a member function to produce each type of output:
\begin{functionlist}
  \functionDef{std::ostream\&}{Nwa::print}{std::ostream \& stream}{const}
    Writes the custom output described in this section
    to \texttt{stream}, returning \texttt{stream}.
    (This method is inherited from
    the \texttt{wali::Printable} abstract class.)

  \functionDefFirstEarlyNoCloseParen{std::ostream\&}{Nwa::print\_dot}{%
      \parbox[t][2\baselineskip][t]{3.5in}{
                 ostream \& os, std::string const \& n,\\
                 bool abbrev = true) const}}
    Outputs a Dot description of the NWA to the stream \texttt{os}, returning
    \texttt{os}.  The name of the graph (\texttt{digraph "n" \{...\}}) is
    given by \texttt{n}. Finally, if \texttt{abbrev} is set to
    \texttt{false}, the aforementioned abbreviation step above for state
    names is not done. Use this freedom at your peril.
\end{functionlist}

\subsection{Parser}
\label{Se:parser}

The NWA library contains a semi-experimental parser for descriptions of
NWAs. The format is described below, and matches the output of
\texttt{Nwa::print(std::ostream \&)}, provided that the result of
calling \texttt{key2str} on the state and symbol keys produces strings
that match the description of \nonterm{actual-names} in this grammar.

\emph{Note:} This is experimental code; in particular, the
error-handling in it is basically non-existent. We detect when the
input does not match the grammar, but we sometimes signal such input failures
with assertion violations. In a future version of the library, we will
report errors more gracefully, almost certainly through exceptions.

There are two functions that parse NWA descriptions, both declared in
\texttt{opennwa/NwaParser.hpp} in namespace \texttt{opennwa}:
\begin{functionlist}
  \functionDef{NwaRefPtr}{read\_nwa}{std::istream \& is, std::string * name = NULL}{} 
  Reads a description of a single NWA from \texttt{is} and
  returns it. The serialization format allows an optional name for an NWA; if
  one is specified and \texttt{name} is non-null, then the name is stored in
  the string pointed to by \texttt{name}.

  \functionDefFirst{std::map$<$std::string, NwaRefPtr$>$}%
                   {read\_nwa\_proc\_set}{std::istream \&}{}
  Reads the entire input stream,
  returning a map from the name of each automaton description to the parsed
  NWA. (This form was originally created to read NWAs for an entire program
  formatted with each procedure in a separate NWA. The name of each NWA was
  the name of the procedure in the original program.) The return type
  is \texttt{typedef}ed to the name \texttt{ProcedureMap},
  defined in the same header and namespace.
\end{functionlist}


\subsection{Examples}

\figrefs{nwa-simple}{nwa-mattf} show examples of the NWA serialization format, illustrating
both the general form and some of the flexibilities in what precise
format is accepted.

\begin{figure}
  \centering
  \begin{tabular}{p{2.5in}}
    \verbatiminput{Figures/figure-1.nwa}
  \end{tabular}
  \caption{Serialized form of the NWA depicted in \figref{Example1}.}
  \label{Fi:nwa-simple}
\end{figure}

\begin{figure}
  \centering
  \begin{tabular}{p{6in}}
    \verbatiminput{Figures/star-simple-result.nwa}
  \end{tabular}
  \caption{Serialized form of the NWA depicted in \figref{Star2}.}
  \label{Fi:nwa-mattf}
\end{figure}

\subsection{NWA description format}
\label{Se:nwa-grammar}

This section describes the grammar of the file format.

\emph{Note:} Some characteristics in the description below have the
following phrase as part of their description: ``you should not rely on
this.'' In such cases, we reserve the right to change this behavior in
future versions. %and to not feel bad about it.

%\emph{Note:} The NWA parser has unit tests embedded in the code. (It
%does not use any unit-test framework.) These tests can be run by
%calling \texttt{opennwa::parser\_test\_all}. The file
%\texttt{NwaParserMain.cpp} is a short program that does this.


The grammar for an NWA is as follows. In order to accept a couple of
slightly different output formats we have used in the past, there are
some choices in whether braces are present and other aspects.


A single input stream can contain either a single NWA (just a series
of blocks) or multiple NWAs. If you would like to describe multiple
NWAs, each individual starts with the literal \texttt{nwa}.


\begin{grammar}
  \nonterm{nwa-description} & ::= & \left(\term{nwa}\cfgsp\nonterm{name}?\cfgsp\term{:}?\right)?\cfgsp\term{\{}?\cfgsp\nonterm{block}{+}\cfgsp\term{\}}?
\end{grammar}

\emph{Braces are required if ``nwa'' is present and the name is absent}
 in order to
 distinguish the first block header (\texttt{Q:}, \texttt{Q0:},
 or \texttt{Qf:}) from the name of the NWA.

An NWA is a sequence of blocks; each block specifies one or more
states, symbols, or transitions.

\begin{grammar}
  \nonterm{block} &  ::= & \nonterm{state-block} \\
                  &    | & \nonterm{sigma-block} \\
                  &    | & \nonterm{delta-block}
\end{grammar}

There can be more than one block of a given type; e.g. it is perfectly fine to
specify all transitions in one block (as in \figref{nwa-mattf}),
one block per transition (as in \figref{nwa-simple}), or
any mixture. The ``block header'' specifies what kind of block
it is, and the body is a comma-separated list of whatever entity the
block header specifies (e.g., states). The body may be surrounded by
curly braces, but they are not required unless the body is empty.

State blocks can specify states, initial states, or accepting states;
these are denoted by
the block headers \texttt{Q:}, \texttt{Q0:}, and \texttt{Qf:},
respectively.
 
\begin{grammar}
  \nonterm{state-block} &  ::= & \term{Q:} \cfgsp \nonterm{name-list} \\
                        &   |  & \term{Q0:}\cfgsp \nonterm{name-list} \\
                        &   |  & \term{Qf:}\cfgsp \nonterm{name-list}
\end{grammar}

(It is not necessary to explicitly list all states; if a state is
listed in a transition, it is implicitly added to the machine as
needed. Thus, it is quite reasonable to have a machine description with
no \texttt{Q:} blocks.)

A name-list is simply a comma-separate list of names of states.

\begin{grammar}
  \nonterm{name-list} & ::= & \term\{? \cfgsp (\nonterm{name}\cfgsp (\term{,}\cfgsp  \nonterm{name})*)?\cfgsp\term\}?
\end{grammar}

Recall that if the list is empty, it must be followed by a \texttt{\}}
(or EOF) to distinguish the next block header from the name of a
state.

The grammar for \nonterm{name} will be specified below.


Symbol blocks are just like name blocks, except that they specify symbol
names. The block header is simply 'sigma'.

\begin{grammar}
  \nonterm{sigma-block} & ::= & \term{sigma:}\cfgsp  \nonterm{name-list}
\end{grammar}

A transition block can list internal, call, or return transitions,
denoted by the block headers \texttt{Delta\_i}, \texttt{Delta\_c}, and
\texttt{Delta\_r}, respectively.
 
\begin{grammar}
  \nonterm{delta-block} & ::= & \term{Delta\_i:}\cfgsp  \nonterm{triple-list} \\
                        &   | & \term{Delta\_c:}\cfgsp  \nonterm{triple-list} \\
                        &   | & \term{Delta\_r:}\cfgsp  \nonterm{quad-list}
\end{grammar}

The bodies in each case are simply a comma-separated list of triples
or quads (like \nonterm{name-list}, if these are empty, they must be followed
by \term{\}}):

\begin{grammar}
  \nonterm{triple-list} & ::= & \term\{?\cfgsp(\nonterm{triple}\cfgsp  (\term{,}\cfgsp  \nonterm{triple})*)?\cfgsp\term\}? \\
  \nonterm{quad-list}   & ::= & \term\{?\cfgsp(\nonterm{quad}\cfgsp  (\term{,}\cfgsp  \nonterm{quad})*)?\cfgsp\term\}?
\end{grammar}

and each triple (resp., quad) is a 3-tuple (4-tuple) of names:

\begin{grammar}
  \nonterm{triple} & ::= &  \term{(}\cfgsp  \nonterm{name}\cfgsp \term{,}\cfgsp  \nonterm{name}\cfgsp  \term{,}\cfgsp  \nonterm{name}\cfgsp  \term{)} \\
  \nonterm{quad} & ::= &  \term{(}\cfgsp  \nonterm{name}\cfgsp \term{,}\cfgsp  \nonterm{name}\cfgsp  \term{,}\cfgsp  \nonterm{name}\cfgsp  \term{,}\cfgsp  \nonterm{name}\cfgsp  \term{)}
\end{grammar}


All that remains is to define the grammar for `name'. Because the
format grew organically and we wished to keep compatibility with
existing log files (which could previously not be automatically parsed),
the `name' terminal is a bit convoluted.

The name consists of the actual name of the state or symbol
in question, followed by an optional parenthesis-delimited token that
is ignored. (In the output of \texttt{print()}, the \nonterm{name} is the result of
\texttt{key2str} and the optional token is the actual numeric value of
the key.)

Before we discuss the grammar of \nonterm{name}, the following shows
some examples of strings that are names:
\begin{quote}
\begin{tabular}{p{1.5in}p{1.5in}p{1.5in}}
  \tt 123         & \tt xyz          & \tt a2y          \\
  \tt 123 (=4)    & \tt xyz (=42)    & \tt a2y (=o3oth) \\
  \tt (a,b)(x,y)  & \tt (a, b) (=32) & \tt $<$1, 2$>$
\end{tabular}
\end{quote}
and strings which are not names:
\begin{quote}
\begin{tabular}{p{1.5in}p{1.5in}p{1.5in}}
  \tt hello world & \tt hello world (=32)  & \tt (unmatched
\end{tabular}
\end{quote}

The actual grammar is:
\begin{grammar}
  \nonterm{name} & ::= &  \nonterm{actual-name}\cfgsp  \nonterm{dummy-token}? \\
  \nonterm{dummy-token} & ::= & \term{(}\cfgsp(\neg\{\term{(}\})*\cfgsp\term{)}
\end{grammar}

The $\nonterm{actual-name}$ portion generally behaves like a standard
programming-language identifier, but with one important difference: it can
contain
balanced parentheses, and \emph{any} characters are allowed within a
set of parentheses. (We actually allow a larger set of characters than
is typical, but you should not rely on this. Surround your names
with parentheses if you use ``special'' characters.)

We can now define the grammar of \term{actual-name}.

\begin{grammar}
  \nonterm{actual-name} & ::= &  \nonterm{unit}+ \\
&& \\
  \nonterm{unit} & ::= &  \nonterm{normal-char} \\
                 &   | &  \nonterm{paren-group} \\
&& \\
  \nonterm{normal-char}& ::= &  \textrm{character other than whitespace, \term{,}, or a paren}
\end{grammar}

(\texttt{std::isspace} is used to determine whether a character is
  whitespace.)

\begin{grammar}
  \nonterm{left-paren}  & ::= &  \term{(}\cfgsp |\cfgsp \term{\{}\cfgsp |\cfgsp \term{[}\cfgsp |\cfgsp \term{$<$} \\
  \nonterm{right-paren} & ::= &  \term{)}\cfgsp |\cfgsp \term{\}}\cfgsp |\cfgsp \term{]}\cfgsp |\cfgsp \term{$>$} \\
  \nonterm{non-paren} & ::= &  \textrm{a character other than any of the above} \\
&& \\
  \nonterm{paren-group}& ::= &  \nonterm{left-paren}\cfgsp  (\nonterm{non-paren}\ |\ \nonterm{paren-group})*\cfgsp  \nonterm{right-paren}
\end{grammar}

(Finally, we do not currently demand that the \emph{types} of parens
  match up: e.g., \texttt{(1, 2, 3]} is a valid name. You should not
    rely on this feature.)
